Robust Rhymes? The Stability of Authorial Style in Medieval Narratives

Authors

  • Mike Kestemont
  • Walter Daelemans
  • Dominiek Sandra
Abstract

We explore the application of stylometric methods developed for modern texts to rhymed medieval narratives (Jacob van Maerlant and Lodewijk van Velthem, ca. 1260–1330). Because of the peculiarities of medieval text transmission, we propose to use highly frequent rhyme words for authorship attribution. First, we shall demonstrate that these offer important benefits, being relatively content-independent and well-spread over texts. Subsequent experimentation shows that correspondence analyses can indeed detect authorial differences using highly frequent rhyme words. Finally, we demonstrate for Maerlant’s oeuvre that the stylistic stability of these highly frequent rhyme words should not be exaggerated, since their distribution correlates significantly with the internal structure of that oeuvre.

*Address correspondence to: Mike Kestemont, University of Antwerp, City campus, Prinsstraat 13, Room D. 118, 2000 Antwerp, Belgium. Tel.: +32 3/220 42 54. Email: [email protected]
Journal of Quantitative Linguistics 2012, Volume 19, Number 1, pp. 54–76. http://dx.doi.org/10.1080/09296174.2012.638796 0929-6174/12/19010054 2012 Taylor & Francis

STYLE AND AUTHORSHIP

Most statistically or computationally supported research into authorship attribution is nowadays style-based, convinced ‘that by measuring some textual features we can distinguish between texts written by different authors’ (Stamatatos, 2009, p. 538). The basic assumption of such stylometric research is consequently that each author has a unique set of linguistic characteristics or a ‘stylome’ (Van Halteren, Baayen, Tweedie, Haverkort & Neijt, 2005) that can be quantitatively distinguished from any other author’s style. In this paper we shall focus on the possibilities of style-based authorship attribution for medieval narratives.
Although authorship attribution has many relevant applications in the domain of historical studies, it is rarely applied to pre-modern data. In our case study we shall investigate a corpus of rhymed narratives (by Jacob van Maerlant and Lodewijk van Velthem) from the medieval Low Countries (ca. 1260–1330). An innovative aspect of this research is that it is restricted to rhyme words, instead of plain words as is common in present-day authorship attribution, because plain words are typically vulnerable to corruption by medieval scribes. For present-day authors, highly frequent words have engendered a lot of scholarly interest, since they are thought to contain reliable indications about authorship. In the first part of this paper, we shall therefore assess whether highly frequent rhyme words could be suited for medieval authorship attribution.

Note that many contemporary authorship studies tacitly assume that an author’s stylome remains relatively constant over time as well as across different texts, topics and text varieties. This supposition has, however, been challenged (Rudman, 1998). It has been doubted whether an author’s style is necessarily constant (Holmes, 1998; Forsyth, 1999; Juola, 2007; Argamon, 2008; Stamatatos, 2009; Luyckx & Daelemans, 2011). In the final part of this paper, we will therefore attempt to determine to what extent the distribution of highly frequent rhyme words is affected by non-authorship related factors in the large oeuvre of a single medieval author.

AUTHORSHIP ATTRIBUTION AND MEDIEVAL LITERATURE

Although computational authorship attribution is surrounded by a lively discussion elsewhere, that discussion is, remarkably, nearly absent in medieval philology (500–1500). One characteristic of medieval data is, after all, its problematic survival. For a variety of reasons (e.g. fires) a lot of important resources such as manuscripts have not survived, or survive only in a severely damaged state.
Therefore, scholars often lack meta-data on their texts: if a manuscript survives fragmentarily, it is often difficult to determine when or where it was produced. Authorship is often the aspect of medieval texts about which we possess the least information. Many texts are of unknown or disputed authorship and their attribution (to known authors or to the authors of other anonymous texts) is therefore an important issue. Because of the particular transmission of medieval texts, authorship attribution for them is, however, anything but straightforward.

Before the advent of the printing press in Western Europe all copies of a particular work were manually produced by scribes (Salemans, 2000). Many medieval manuscripts that survive nowadays are in fact copies (of copies) of the original author’s text; the original ‘autographs’ have rarely survived. Manual copying was an error-prone activity, so that scribes unwittingly introduced mistakes in a copy, ‘corrupting’ the authorial text (Roos & Heikkilä, 2009). No standard spelling or language existed, so that spelling was phonological, reflecting a scribe’s personal dialect or regional spelling habits (Kestemont, Daelemans & De Pauw, 2010). Apparently, scribes saw no difficulties in adapting their exemplar’s spelling and language, and with each copy a text risked an increased deviation from the original (Spencer & Howe, 2001). Table 1 gives an example of how one line from the Rijmbijbel, one of the texts dealt with below, survives in a series of parallel manuscripts (Kestemont & Van Dalen-Oskam, 2009). Note how scribes have introduced subtle variations in the text or sometimes even changed the wording. Recent research has shown that the influence of scribes might be even larger than previously assumed (Van Dalen-Oskam & Van Zundert, 2007).
The alterations that scribes introduced tend to be systematic and often move beyond mere innocent spelling or dialectal adaptations. In a number of case studies, it has been shown that medieval scribes had a ‘style’ of their own (Kestemont & Van Dalen-Oskam, 2009). Apparently, scribes enjoyed a large freedom in adapting texts, even to such an extent that their appropriation of texts can be modeled. This raises the question of to what extent the stylistic traits of an original author are preserved in subsequent copies. The strong impact of scribes suggests that the features that are traditionally used in authorship attribution might be of questionable relevance for medieval texts, since these are likely to contain markers for scribal, rather than authorial identity (Kestemont, 2010b).

Table 1. An example of the variation in medieval text transmission: one line from the Rijmbijbel shows variant readings in a series of parallel manuscripts.

| Manuscript | Variant reading for one line from the Rijmbijbel (‘On that moment and the same time’) |
|---|---|
| D | Ter stont ende ter seluer vren |
| E | Tier stont ende ter seluer vren |
| F | TIere stont enter seluer vren |
| G | Tottien stonden en ter uren |
| H | TEn stonden ende ter seluer vren |
| I | Tjerst stont ende tier veren |
| J | Tyer stont ende tier seluer vren |
| N | TJer stont tier seluer vre |

One interesting and practical bypass for this problem has been suggested: rhyme words (Besamusca, 2003). Throughout the Middle Ages a good deal of the literature was rhymed, an acoustic quality of texts that was of course important for a semi-literate culture, in which literature was received through oral recitation rather than silent reading. In the medieval Low Countries, for instance, the rhymed couplet was the preferred verse form for most of the narrative literature until well into the late medieval period (Lie, 1994). The rhymed couplet (aabbccdd ...)
was often used to structure medieval epics of a larger size. What is interesting is that rhyme words tend to be a stable element in medieval text transmission, very robust to scribal corruption (Kestemont, 2010b). Scribes generally refrained from manipulating the underlying rhyme words of a text (Besamusca, 2003), as is also clear from Table 1. It is of course cumbersome to change the rhyme words of a text without rewriting a considerable piece of it. Even if scribes did change the spelling of rhyme words, the underlying lexemes were often left untouched. It therefore makes sense to apply stylometric methods to the lexemes (lemmas) of words in rhyme position, since these are likely to contain non-contaminated indications about the original authorship of texts.

The number of possible rhyme word combinations in a language is moreover limited: authors were bound to recycle rhyme words often. It is not inconceivable that they would display individual predilections for a subset of these words, and use them as ‘stopgaps’ or ‘mnemonics’, once they had proven useful. Frequently recurring rhyme words could therefore function as the ‘fingerprint’ of an author. The objective of this paper is therefore to study these rhyme words by means of a representative case study and determine whether it is feasible to apply stylometric methods to them. Note that we shall study the use of rhyme words in isolation from the combinations they appear in (e.g. pairs in the case of the couplet). Although rhyme combinations could contain markers of authorial style too, they fall outside the scope of the present study.

CASE-STUDY: MAERLANT AND VELTHEM

Our case study is taken from the medieval Low Countries and focuses on the surviving works of two medieval Dutch authors (Jacob van Maerlant and Lodewijk van Velthem). Jacob van Maerlant (ca. 1240–ca.
1300) was undoubtedly one of the most influential authors of the medieval Low Countries: one medieval poet called him the ‘founding father of all poets who wrote in Dutch’ (Van Oostrom, 1996). Maerlant has left us an extensive oeuvre of narrative texts. Table 2 introduces the texts included in our corpus. Our corpus is described in detail at the end of this paper. In this table, the texts have been ranked according to their date of composition, as suggested by the current state of the art in the research field (Van Oostrom, 1996).

Table 2. An overview of Maerlant’s works included in the corpus.

| Full Middle Dutch title | Abbreviation | Text variety |
|---|---|---|
| Alexanders Geesten (‘The deeds of Alexander the Great’) | M1 | Chivalric |
| Historie van den Grale (‘The history of the holy grail’) | M2 | Chivalric |
| Roman van Torec (‘The romance of the knight Torec’) | M3 | Chivalric |
| Historie van Troyen (‘The history of Troy’) | M4 | Chivalric |
| Heimelijkheid der Heimelijkheden (‘The secret of secrets’) | M5 | Ethic-didactic |
| Der naturen bloeme (‘The best of nature’) | M6 | Ethic-didactic |
| Rijmbijbel (‘The rhyming Bible’) | M7 | Historiographical |
| Sinte Franciscus leven (‘The life of Saint Francis’) | M8 | Historiographical |
| Spiegel historiael (Derde Partie) (‘The mirror of history’, third part) | M9 | Historiographical |

Some specific problems have to be taken into account. M2 and M3 each survive in a single, unique manuscript, and for both of them there are indications that they might be heavily corrupted by the compilers of these manuscripts (Besamusca, Sleiderink & Warnar, 2009). It will therefore have to be determined whether these versions sufficiently preserve the original style of the author. In M4 Maerlant included a text known to be written by another author, Segher Diengotgaf (Kestemont, 2010a). Because of the unclear status of this interpolation, we have excluded this part of M4 from our corpus.
Regarding M5, it has been doubted whether Maerlant actually wrote it (his name appears in only two of the three surviving manuscripts), and its date of composition is also sometimes doubted (Van Oostrom, 1996). Although recent studies do not seem to doubt this anymore, this issue needs special attention. The technique of authorship attribution lends itself especially well to addressing it.

It should be noted that the size of Maerlant’s oeuvre is fairly large for a medieval author. Selecting a contemporary oeuvre that parallels Maerlant’s is not without problems. Lodewijk van Velthem (beginning of the fourteenth century) seems the safest option. Velthem was a great admirer of Maerlant, even to such an extent that he has been characterized as the executor of Maerlant’s literary testament (Van Oostrom, 1996). Only two works survive that can be attributed without any doubt to Velthem (Table 3). Both are continuations of works by Maerlant: V1, for instance, is the sequel to M9 (see Table 2). V2 firmly builds upon M2 and is in fact extant in the same unique manuscript. The lack of an independent tradition for M2 has sometimes raised the question to what extent Velthem altered Maerlant’s original text (Besamusca, Sleiderink & Warnar, 2009). Note that the same is true for M3: it survives in a single manuscript, probably compiled by Velthem. Especially in this case, it is often claimed that Velthem might have manipulated Maerlant’s M3 to a large extent.

Table 3. An overview of Velthem’s works included in the corpus.

| Full title | Abbreviation | Text variety |
|---|---|---|
| Spiegel historiael (Vierde en Vijfde Partie) | V1 | Historiography |
| Merlijn-continuatie | V2 | Chivalric |

Because of the literary proximity between these authors, this corpus serves as a good test case for authorship attribution. Our hypothesis is that, if rhyme words can serve as reliable indicators of authorship, it should be possible to distinguish the texts containing the typical rhyme word ‘fingerprint’ that each author has left on them. The rhyme words in this corpus have been tokenized and lemmatized (Kestemont et al., 2010). All the experiments below are restricted to these lemma tags (i.e. the underlying lexemes), in order to abstract away from superficial spelling variation introduced by scribes. Table 4 presents some general facts about the lemmatized versions of the texts and rhyme words in this corpus.

AUTHORSHIP AND HIGH-FREQUENCY ITEMS

Although additional linguistic characteristics (such as syntax or character n-grams) are widely studied, lexical features remain popular in authorship attribution studies (Stamatatos, 2009). In these studies, text samples are represented as vectors, consisting of a fixed number of parameters indicating the normalized frequencies of a set of words. The main advantages of lexical features are that (a) their performance is generally acceptable; (b) their extraction generally requires little linguistic preprocessing (except tokenization); and (c) their stylistic relevance is often easy to interpret. Note that authorship attribution based on lexical features often does benefit from the inclusion of additional (e.g. syntax-based) feature types but that these other feature types in isolation rarely outperform lexical features (Van Halteren et al., 2005; Luyckx & Daelemans, 2011). One feature type that generally does outperform

Table 4. General information about the rhyme words in the texts in the corpus.

| Text | Number of lemma tokens | Number of distinct lemma types (per text) | Number of hapaxes (types) |
|---|---|---|---|
| M1 | 14.237 | 1790 | 128 |
| M2 | 8.601 | 1224 | 76 |
| M3 | 3.854 | 832 | 30 |
| M4 | 38.391 | 2652 | 383 |
| M5 | 2.156 | 793 | 40 |
| M6 | 16.672 | 2298 | 409 |
| M7 | 34.708 | 2454 | 327 |
| M8 | 10.494 | 1461 | 105 |
| M9 | 31.08 | | |
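
The vector representation described under AUTHORSHIP AND HIGH-FREQUENCY ITEMS, together with the token, type and hapax counts of the kind reported in Table 4, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy lemma lists, and the choice to normalize each count by sample length are all hypothetical and introduced here only for clarity.

```python
from collections import Counter


def rhyme_profile(lemmas):
    """Count lemma tokens, distinct lemma types, and hapaxes (types seen once)."""
    counts = Counter(lemmas)
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return {"tokens": len(lemmas), "types": len(counts), "hapaxes": hapaxes}


def frequency_vectors(samples, n_features):
    """Represent each sample by the relative frequencies of the corpus-wide
    most frequent rhyme lemmas (assumed normalization: count / sample length)."""
    corpus_counts = Counter()
    for lemmas in samples.values():
        corpus_counts.update(lemmas)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(corpus_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    features = [lemma for lemma, _ in ranked[:n_features]]
    vectors = {}
    for name, lemmas in samples.items():
        counts = Counter(lemmas)
        total = len(lemmas)
        vectors[name] = [counts[f] / total if total else 0.0 for f in features]
    return features, vectors


# Hypothetical toy samples: invented rhyme-lemma sequences, not corpus data.
toy_samples = {
    "sample_a": ["zijn", "doen", "zijn", "goed", "moed", "zijn"],
    "sample_b": ["doen", "doen", "goed", "stad"],
}

features, vectors = frequency_vectors(toy_samples, n_features=3)
profile_a = rhyme_profile(toy_samples["sample_a"])
```

Each sample ends up as a fixed-length list whose positions line up with `features`, which is the form that methods such as correspondence analysis expect as input. Working on lemmas rather than surface spellings mirrors the restriction to lemma tags described above.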

Similar resources

Incorporating Authorial Intent into Generative Narrative Systems

One of the major themes to emerge in interactive narrative research is authorability and authorial intent. With interactive narratives, the human author is not present at run-time. Thus authoring interactive narratives is often a process of anticipating user actions in different contexts and using computational mechanisms and data structures for responding to the participant. Generative approac...

Medieval anxieties: translation and authorial self-representation in the vernacular beast fable

My dissertation examines the concept of vernacular translation in the Middle Ages, particularly examining French and Middle English texts. It focuses on a specific genre of literature popular in the Middle Ages but relatively ignored in contemporary literary scholarship: the beast fable. My argument is that some of the principal writers of vernacular fables from the twelfth through the fifteent...

Predicting the Past: Memory Based Copyist and Author Discrimination in Medieval Epics

In this paper we will focus on the scribal variation in manually copied medieval texts. Using a lazy machine learning technique, we will argue that it is possible to discriminate between scribes, implying that they did adapt texts when copying them. Consequently, we will assess to what extent scribal interventions compromise our ability to detect the original authorship of medieval texts. It wi...

Using Automated Rhyme Detection to Characterize Rhyming Style in Rap Music

Imperfect and internal rhymes are two important features in rap music previously ignored in the music information retrieval literature. We developed a method of scoring potential rhymes using a probabilistic model based on phoneme frequencies in rap lyrics. We used this scoring scheme to automatically identify internal and line-final rhymes in song lyrics and demonstrated the performance of thi...

Passivity-Based Stability Analysis and Robust Practical Stabilization of Nonlinear Affine Systems with Non-vanishing Perturbations

This paper presents some analyses about the robust practical stability of a class of nonlinear affine systems in the presence of non-vanishing perturbations based on the passivity concept. The given analyses confirm the robust passivity property of the perturbed nonlinear systems in a certain region. Moreover, robust control laws are designed to guarantee the practical stability of the perturbe...

Journal title:
  • Journal of Quantitative Linguistics

Volume 19, Issue 1

Pages 54–76

Publication date: 2012